13. Conclusions Using Groupby
Drawing Conclusions Using Groupby
In the notebook below, you're going to investigate two questions about this data using pandas' groupby function. Here are tips for answering each question:
Q1: Is a certain type of wine (red or white) associated with higher quality?
For this question, use groupby to compare the average quality of red wine with the average quality of white wine. To do this, group by color and then find the mean quality of each group.
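As a minimal sketch of the Q1 approach, the snippet below groups by color and averages quality. The DataFrame is a small hypothetical stand-in for the real wine data. It assumes the combined dataset has columns named color and quality, and the name wine_df is illustrative.

```python
import pandas as pd

# Hypothetical toy data standing in for the combined red/white wine dataset;
# assumes columns named "color" and "quality".
wine_df = pd.DataFrame({
    "color": ["red", "red", "white", "white", "white"],
    "quality": [5, 6, 6, 7, 5],
})

# Group rows by wine color, then average the quality within each group.
mean_quality_by_color = wine_df.groupby("color")["quality"].mean()
print(mean_quality_by_color)
```

On the real data, the color with the larger mean is the one associated with higher average quality in this sample.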
Q2: What level of acidity (pH value) receives the highest average rating?
This question is trickier because, unlike color, which has clear categories you can group by (red and white), pH is a quantitative variable without clear categories. However, there is a simple fix: you can create a categorical variable from a quantitative variable by defining your own categories.
pandas' cut function lets you "cut" data into groups. Using it, create a new column called acidity_levels with these categories:
Acidity Levels:
- High: Lowest 25% of pH values
- Moderately High: 25% - 50% of pH values
- Medium: 50% - 75% of pH values
- Low: 75% - max pH value
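One way to build these four levels is a sketch like the following, which is not necessarily the course's reference solution. It takes the min, 25%, 50%, 75%, and max values from describe() as bin edges for pd.cut. The pH values are hypothetical, and the column name pH is assumed to match the dataset. Lower pH means more acidic, so the lowest quartile gets the label "High".

```python
import pandas as pd

# Hypothetical pH readings standing in for the real dataset's "pH" column.
wine_df = pd.DataFrame({"pH": [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7]})

# describe() reports min, 25%, 50%, 75%, and max; use them as the bin edges.
stats = wine_df["pH"].describe()
bin_edges = [stats["min"], stats["25%"], stats["50%"], stats["75%"], stats["max"]]

# Lower pH = more acidic, so the lowest quartile is labeled "High".
bin_names = ["High", "Moderately High", "Medium", "Low"]

# pd.cut bins are right-inclusive by default; include_lowest=True keeps the
# minimum pH in the first bin instead of leaving that row as NaN.
wine_df["acidity_levels"] = pd.cut(
    wine_df["pH"], bins=bin_edges, labels=bin_names, include_lowest=True
)
print(wine_df)
```

The include_lowest flag matters here. Without it, the row holding the smallest pH falls outside every interval. Also note that pd.cut raises an error if two quartile edges are equal, which can happen with heavily repeated pH values.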
Here, the data is split at the 25th, 50th, and 75th percentiles. Remember, you can get these numbers with pandas' describe()! After you create these four categories, you'll be able to use groupby to get the mean quality rating for each acidity level.
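Putting the pieces together, the sketch below answers Q2 by binning pH into acidity levels and then averaging quality per level. The data is hypothetical, with pH and quality columns assumed to match the real dataset. Passing observed=False keeps all four categories in the result even if one is empty.

```python
import pandas as pd

# Hypothetical data; assumes "pH" and "quality" columns as in the wine dataset.
wine_df = pd.DataFrame({
    "pH":      [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7],
    "quality": [5,   6,   6,   7,   5,   6,   6,   6],
})

# Bin pH at the quartiles reported by describe().
stats = wine_df["pH"].describe()
bin_edges = [stats["min"], stats["25%"], stats["50%"], stats["75%"], stats["max"]]
bin_names = ["High", "Moderately High", "Medium", "Low"]
wine_df["acidity_levels"] = pd.cut(
    wine_df["pH"], bins=bin_edges, labels=bin_names, include_lowest=True
)

# Mean quality for each acidity level; observed=False keeps every category.
mean_quality = wine_df.groupby("acidity_levels", observed=False)["quality"].mean()
print(mean_quality)

# idxmax() returns the label of the level with the highest mean rating.
print("Highest average rating:", mean_quality.idxmax())
```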
Workspace
This section contains a workspace (for example, a Jupyter Notebook workspace or an online code editor workspace) that cannot be downloaded automatically. Please access the classroom with your account and manually download the workspace to your local machine. Note that for some courses, Udacity uploads the workspace files to https://github.com/udacity, so you may be able to download them there.
Workspace Information:
- Workspace type: jupyter